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» | | A geometric analysis of the global properties of the energy landscape of a minimalistic model 

O |. of a polypeptide is presented, which is based on the relation between dynamical trajectories and 

1 geodesies of a suitable manifold, whose metric is completely determined by the potential energy. We 

consider different sequences, some with a definite protein-like behavior, a unique native state and a 
folding transition, and the others undergoing a hydrophobic collapse with no tendency to a unique 
native state. The global geometry of the energy landscape appears to contain relevant information 
on the behavior of the various sequences: in particular, the fluctuations of the curvature of the 
energy landscape, measured by means of numerical simulations, clearly mark the folding transition 
. and allow to distinguish the protein-like sequences from the others. 
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Protein folding is one of the most fundamental and challenging open questions in molecular biology. Proteins are 
polymers made of amino acids and since the pioneering experiments by Anfinsen and coworkers [1| it has been known 
that the sequence of amino acids — also called the primary structure of the protein — uniquely determines its native 
| state, or tertiary structure, i.e., the compact configuration the protein assumes in physiological conditions and which 
'— makes it able to perform its biological tasks |2]. To understand how the information contained in the sequence is 
translated into the three-dimensional native structure is the core of the protein folding problem, and its solution would 
allow one to predict a protein's structure from the sole knowledge of the amino acid sequence: being the sequencing 
\q ■ of a protein much easier than experimental determination of its structure, it is easy to understand the impact such 
, a solution would have on biochemistry and molecular biology. Moreover, solving the protein folding problem would 
■ make it possible to engineer proteins which fold to any given structure (what is commonly referred to as the inverse 
folding problem), which in turn would mean a giant leap in drug design. Despite many remarkable advances in the 
■^J- • last decades, the protein folding problem is still far from a solution [2J. 

A polymer made of amino acids is referred to as a polypeptide. However, not all polypeptides are proteins: only 
a very small subset of all the possible sequences of the twenty naturally occurring amino acids have been selected by 
evolution. According to our present knowledge, all the naturally selected proteins fold to a uniquely determined native 
, state, but a generic polypeptide does not (it rather has a glassy- like behavior) . Then the following question naturally 
arises: what makes a protein different from a generic polypeptide? or, more precisely, which are the properties 
a polypeptide must have to behave like a protein, i.e., to fold into a unique native state regardless of the initial 
conditions, when the environment is the correct one? This question is, obviously, much less general than the whole 
folding problem, nonetheless if one could give it an answer it would surely help in the quest of a solution to the folding 
problem. 

This question has aroused a lot of interest in the recent years: the energy landscape picture has emerged as crucial 
in this respect. Energy landscape, or more precisely potential energy landscape, is the name commonly given to 
the graph of the potential energy of interaction between the microscopic degrees of freedom of the system Q; the 
latter is a high-dimensional surface, but one can also speak of a free energy landscape when only its projection on 
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a small set of collective variables (with a suitable average over all the other degrees of freedom) is considered [3j. 
Before having been applied to biomolccules, this concept has proven useful in the study of other complex systems, 
especially of supercooled liquids and of the glass transition [J]. The basic idea is very simple, yet powerful: if a 
system has a rugged, complex energy landscape, with many minima and valleys separated by barriers of different 
height, its dynamics will experience a variety of time scales, with oscillations in the valleys and jumps from one valley 
to another [371]. Then one can try to link special features of the behavior of the system (i.e., the presence of a glass 
transition, the separation of time scales, and so on) to special properties of the landscape, like the topography of the 
basins around minima, the energy distribution of minima and saddles connecting them and so on. Anyway, a complex 
landscape yields a complex dynamics, where the system is very likely to remain trapped in different valleys when the 
temperature is not so high. This is consistent with a glassy behavior, but a protein does not show a glassy behavior, 
it rather has relatively low frustration. This means that there must be some property of the landscape such to avoid 
too much frustration. This property is commonly referred to as the folding funnel [5j: though locally rugged, the 
low-energy part of the energy landscape is supposed to have an overall funnel shape so that most initial conditions 
are driven towards the correct native state. The dynamics must then be such as to make this happen in a reasonably 
fast and reliable way, i.e., non- native minima must be efficiently connected to the native state si that trapping in the 
wrong onfiguration si unlikely. 

However, a direct visualization of the energy landscape is impossible due to its high dimensionality, and its detailed 
properties must be inferred indirectly. A possible strategy is a local one: one searches for the minima of the landscape 
and then for the saddles connecting different minima. This is practically unfeasible for accurate all- atom potential 
energies, but may become accessible for minimalistic potentials [38|. Minimalistic models are those where the polymer 
is described at a coarse-grained level, as a chain of N beads where N is the number of amino acids; no explicit 
water molecules are considered and the solvent is taken into account only by means of effective interactions among 
the monomers. Minimalistic models can be relatively simple, yet in some cases yield very accurate results which 
compare well with experiments lalal- T he local properties of the energy landscape of minimalistic models have been 
recently studied (see e.g. Refs. |lOl I 111 [Hi \l$L [iH HH) and very interesting clues about the structure of the folding 
funnel and the differences between protein-like heteropolymers and other polymers have been found: in particular, 
it has been shown that a funnel-like structure is present also in homopolymers, but what makes a big difference is 
that in protein-like systems jumps between minima corresponding to distant configurations are much more favoured 
dynamically [l6l |. 

The above mentioned local strategy to analyze energy landscapes requires however a huge computational effort if 
one wants to obtain a good sampling. So the following question arises: is there some global property of the energy 
landscape which can be easily computed numerically as an average along dynamical trajectories and which is able to 
identify polymers having a protein- like behavior? Wc shall show in the following that such a quantity indeed exists, 
at least for the minimalistic model we considered, and that it is of a geometric nature. In particular, the fluctuations 
of a suitably defined curvature of the energy landscape clearly mark the folding transition while do not show any 
remarkable feature when the polymer undergoes a hydrophobic collapse without a preferred native state. This is at 
variance with thermodynamic global observables, like the specific heat, which show a very similar behavior in the case 
of a folding transition and of a simple hydrophobic collapse. 

The paper is organized as follows: we first describe the geometric approach to energy landscapes, then we discuss 
the model studied and our results. A final section is devoted to some comments. A short, preliminary account of a 
part of the results presented here has already been given in [l7| • 



II. GEOMETRY OF THE ENERGY LANDSCAPE 



The intuitive reason why geometric information on the landscape, and especially curvature, could be relevant to the 
problem of folding is that the dynamics on a landscape would be heavily affected by the local curvature: minima of the 
energy landscape are associated to positive curvatures and stable dynamics, while saddles involve negative curvatures, 
at least along some direction, thus implying some instability. One can reasonably expect that the arrangement and 
detailed properties of minima and saddles might reflect in some global feature of the distribution of curvatures of the 
landscape, when averaged along a typical trajectory. 

The definition of the curvature of a manifold M depends on the choice of a metric g once the couple (M, g) is 

given, a covariant derivative and a curvature tensor i?(e;, ej) can be defined; the latter measures the noncommutativity 
of the covariant derivatives in the coordinate directions and ej . A scalar measure of the curvature at any given 
point P E M is the the sectional curvature 

K(ei,ej) = (R(ei,ej)ej,ei) , (1) 
where (■, •) stands for the scalar product. At any point of an ./V-dimensional manifold there are N(N — 1) sectional 
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curvatures, whose knowledge determines the full curvature tensor at that point. One can however define some simpler 
curvatures (paying the price of losing some information): the Ricci curvature K^iei) is the sum of the AT's over the 
N — 1 directions orthogonal to ej, 

N 

K R ( ei ) = J2 K (^^) > (2) 

3=1 

and summing also on the N directions e% one gets the scalar curvature 

N N 

K = J2 K n(ei)=J2 K ( e i> e j)-> ( 3 ) 

i—l 2,3 — 1 

then, -jgMjL and N J^_ 1 ^ can be considered as average curvatures at a given point. 

Although one expects the association between minima and positive curvatures on the one side and negative cur- 
vatures along some directions and saddles on the other side to be essentially true for most choices of the metric, 
a particular choice of g among the many possible ones must be made in order to perform explicit calculations. 
The most immediate choice would probably be that of considering as our manifold M the A-dimensional surface 
z = V(q±, . . . , gjv) itself, i.e., the graph of the potential energy V as a function of the N coordinates q\, . . . , of 
the configuration space, and to define g as the metric induced on that surface by its immersion in Although 
perfectly reasonable, this choice has two drawbacks: (i) the explicit expressions for the curvatures in terms of deriva- 
tives of V are rather complicated and (ii) the link between the properties of the dynamics and the geometry is not 
very precise, i.e., one cannot prove that the geometry completely determines the dynamics and its stability. For these 
reasons we left the investigation of this particular geometry to future work and we considered a choice of (M, g) such 
that the link between geometry and dynamics is more clear. 

Let us now describe this metric and its relation to dynamics. We shall mention only the most important results, 
referring the reader to the review paper [23| for the details. 



A. Geometry and dynamics 

Let us consider a standard Hamiltonian system, with Hamiltonian function of the form 

N 2 

H = "£^ + V( qi ,...,q N ), (4) 
t-^ 2m,; 

i— 1 

where g 4 and pi are the canonically conjugated coordinates and momenta of the system, N is the number of degrees 
of freedom and rm are the masses; in the following we shall consider mi = 1 Vi. 

The trajectories of the system Q in configuration space can be seen as geodesies of a Riemannian manifold: this 
classic result is based on the variational formulation of dynamics 20]. Hamilton's principle states that the natural 
motions of a system are the extrema of the action functional 
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S= Ldt , (5) 



where L is the Lagrangian function of the system, 



L = -S ij q i e -V( qi ,...,q N ) (6) 



(summation on repeated indices is understood from now on); on the other hand, the geodesies of a Riemannian 
manifold are extrema of the length functional 



£ = / da , (7) 

where s is the arc-length defined by 



ds 2 = gij dq l dq^ , 



(8) 
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so that we can identify the geodesies with the physical trajectories of the system by choosing a suitable metric g 
linking action and length. 

The typical example of such a metric is the Jacobi metric on the TV-dimensional configuration space M, 

g ij =2[E-V(q 1 ,...,q N )]S ij , (9) 

where E is the total energy of the system. Starting from Eqs. (O one can easily show that the geodesic equations, 
i.e., the Euler-Lagrange equations for the length functional 0, whose expression in local coordinates is 

^ + r; fc ^ = o, do) 

ds 2 3k ds ds y ' 

where the T's are the Christoffel symbols [39]. become Newton's equations 

d 2 q l dV 



dt 2 



(11) 



However, the choice of such a metric is not unique. A particularly useful metric was introduced by Eisenhart in 
1929 [2l[; it is the one we used and we are going to describe in the following. 



1. Eisenhart metric and landscape curvature 



One could be tempted to identify actions with lengths by considering the configuration spacetime, i.e., Mxl, with 
local coordinates (q$, q\, gjv) where go is the time coordinate, however it can't be easily done. Eisenhart's idea was 
to further enlarge the configuration spacetime with another dimension, i.e., to consider Mxl 2 with local coordinates 
(qo, 5i, qN, qN+i), and to endow this manifold with a pseudo-Riemannian metric whose arc-length is 



ds 2 = 5 ltj dq l dq J - 2V{q){dq ) 2 + 2dq°dq N+1 . 
This is the Eisenhart metric [2lj : its metric tensor will be referred to as gE and its components are 



(12) 



f-2V{q) 
1 



9E 



\ 



1^ 




1 

0/ 



(13) 



as can be derived by Eq. (fT2"|) . 

The connection between the geodesies of this metric and the natural motions of the system is contained in a theorem 
(for a detailed statement and proof see [13]) stating that the natural motions of a Hamiltonian dynamical system are 
obtained by projecting on the configuration space-time Mxl those geodesies of (M x R 2 , c^e) whose arc-lengths are 
positive definite and affinely parametrized with time (remember that q° — t), i.e., given by: 

ds 2 = c\ dt 2 , (14) 

where c\ is an arbitrary constant. Condition (TT4")) can be equivalently cast as a condition on q N+1 , that is 



,iV+l 



Ldt 



(15) 



where C2 is another arbitrary constant. Conversely, every point of M x R 2 such that its projection on the configuration 
spacetime lies on a a physical trajectory of the system and q N+1 is given by (|15p belongs to an affinely parametrized 
geodesic of (M xR 2 ,g E )- 

The nonzero Christoffel symbols of the Eisenhart metric are 



(16) 
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so that the geodesic equations (fTOj) become 



d 2 q° 
ds 2 



= , (17) 



dV dq°dq° 

^ + r °°^^7 ~ °' (18) 

d,S 2 d S d S ~ 



and using ds = dt, i.e., condition (fTi|) with c\ = 1, we have 



A 2 

dY ay 



dt 2 dq, 
d 2 q N+1 _ _dL 
dt 2 ~ dt 



; (20) 

(21) 
(22) 



Equation (|2"0)) states that q° = t, the N equations (f2"Tj) are Newton's equations and Eq. (J^H) is the differential version 
of Equation (fl~5)) . 

The nonvanishing components of the curvature tensor are 

RoiOj = didjV ; (23) 

it can then be shown that the Ricci curvature ^ in the direction of motion, i.e., in the direction of the velocity vector 
V of the geodesic, is given by 

K R {v) = AV , (24) 

where Ay is the Laplacian of the potential V, and that the scalar curvature TZ identically vanishes. 

We note that Kr(v) is nothing but a scalar measure of the average curvature "felt" by the system during its 
evolution; we will refer to it simply as Kr dropping the dependence on the direction. Another feature of Kr is its 
very simple analytical expression which simplifies both analytical calculation and numerical estimates. It is also worth 
noticing that expression (|24[) is a very natural and intuitive measure of the curvature of the energy landscape, as it 
can be seen as a naive generalization of the curvature f"{x) of the graph of a one- variable function to the graph of 
the .ZV-dimensional function V{q\ 1 ^at): the Laplacian of the function. However, the previous discussion shows that 
it is much more than a naive measure of curvature and that it contains information on the local neighborhood of the 
dynamical trajectories. This can be exploited to gain information on the stability of the dynamics. However, we shall 
not go on along this line here, and we refer the reader to the review [23[ as well as to the monograph [24j . 

The Ricci curvature defined in Eq. (|24[) will be the quantity we shall use to characterize the geometry of the energy 
landscape. 



III. MODEL AND NUMERICAL SIMULATIONS 



Let us now describe the model whose energy landscape geometry we studied. We considered a simple model able 
to describe protein-like polymers as well as polymers with no tendency to fold; the different behaviors being selected 
upon the choice of the amino acidic sequence. The model we chose is a minimalistic model originally introduced by 
Thirumalai and coworkers [25| . In order to characterize its energy landscape geometry, we sampled the value of the 
Ricci curvature Kr defined in Eq. (JMJ) along its dynamical trajectories. We shall now describe the model and the 
details of the numerical simulations; then we shall report on the results. 



A. The model 



The Thirumalai model is a three-dimensional off-lattice model of a polypeptide which has only three different kinds 
of amino acids: polar (P), hydrophobic (H) and neutral (N). The potential energy is 



y(fi,...,fV) = Vt,ond(H - + VangularG^i ~ 

+ Vdihedral(V'i) + ^on-bonded (^1 , ■ • ■ , fiv) 



(25) 
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where 



JV-l 



Vbond 
^angular 
^dihedral 



i=l 

i=l 
fV-3 

E + cos^] +B,[1 + cos(3^)]} 



i=l 

W-3 iV 



i; 



non-bonded 



E E 

i=l j=i-\-S 



(26) 
(27) 
(28) 
(29) 



where r*j is the position vector of the i-th monomer, f^j = ri — r}, i?j is the z-th bond angle, i.e., the angle between 
fi + \ and r*i, ^ the z-th dihedral angle, that is the angle between the vectors hi — r^+i,j x ri+i.i+2 and rij+i = 
^+2,i+i x ri +2 ,i+3, fc r = 100, a = 1, = 20, $o = 105°, = and £?i = 0.2 if at least two among the residues 
i, i + 1, i + 2, i + 3 are N, At = Bi = 1.2 otherwise. As to V^, , we have 



if «, j = P, P or i, j = P, H, 



if i, j = H, H and 



Vij = 4 



(?) 
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r 



^ = 4 



(?)' 



(30) 



(31) 



(32) 



if either i or j are N [25| . 

The meaning of the different terms of the potential energy is the following. Vbond accounts for the covalent bonds 
between the a-carbons on the main protein chain, V an gular f° r the energy cost associated to deviations from the 
preferred bond angle between two neighbors on the chain, and Vdihedrai describes the contribution of non planar 
structures to the energy. These terms have very different energy scales, due to the different strengths of the bonds, 
and the dihedral term Vdihedrai strongly depends on the local amino acidic composition: the formation of non- 
planar structures (turns and so on) is favored by the presence of neutral monomers (see Fig. [1}. All the interactions 
between non-neighboring monomers along the chain, including hydrophobic effective interactions, are described by the 
Lennard- Jones term Kion-bonded- The presence of different energy scales as well as the competition between bonding 
terms (favoring planar, straight configurations) and non-bonded interactions (favoring compact configurations when 
the polymer is mostly hydrophobic) suggest that the energy landscape is very complicated and the possibilty of a 
high degree of frustration in the system: however, it is clear that these features will strongly depend on the details 
of the aminoacidic sequence. For example, a hydrophobic homopolymer is expected to be more frustrated than 
a heteropolymer, because in the homopolymer case there is a large number of compact configurations which are 
energetically equivalent (at least as long as the non-bonded contribution to the energy is considered) while with a 
heteropolymer different compact configurations have different energies. 



B. Simulations and results 



Although the identity between trajectories and projections of the geodesies of (M,gE) only holds if the dynamics 
is the Newtonian one, a Langevin dynamics, obtained by adding to the deterministic force VV a random force 
according to the fluctuation-dissipation theorem and a friction term proportional to the velocity, is a more reasonable 
model of the dynamics of a polymer in aqueous solution when the solvent degrees of freedom are not taken into 
account explicitly. Since we are interested not in the details of the time series of Kr along a particular trajectory 
but only in its statistical distribution, we may expect that also a sampling obtained using the Langevin dynamics 
gives the same information on the geometry of the landscape. To check this assumption we let the system evolve 
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tp (rad) 



FIG. 1: (Color online) Vdihedrai vs. dihedral angle -i/>i : the red (lower) curve corresponds to the case in which at least two 
out of the four beads defining the angle tpi are neutral (N), the blue (upper) curve to the other cases; the central minimum 
corresponds to a planar configuration of the four monomers defining ipi. 

with both a Newtonian dynamics (using a symplectic algorithm [29| to integrate the equations of motion) and a 
Langevin dynamics (using the same algorithm — a modified Verlet — and parameters as in Ref. [25|]) obtaining very 
similar results. This is reasonable because at equilibrium, i.e., for sufficiently long simulations, the Langevin dynamics 
(resp. Newtonian dynamics) samples the phase space according to a canonical (resp. microcanonical) distribution, 
and the two distributions are expected to be equivalent for large systems (40J, apart from small deviations due to finite 
size effects. 

In the following we shall refer only to results obtained with Langevin dynamics. 

1. Simulation details 

All the numerical results will be given in natural units defined as follows: the unit of length £ is equal to the 
equilibrium bond length (i.e., the equilibrium distance between two consecutive beads in the chain), the unit of energy 
e is the hydrophobic interaction scale, i.e., the depth of the Lennard-Jones potential well between two hydrophobic 
beads, the unit of mass m is the mass of the residues. Then the time unit becomes l\J m/e and the temperature unit 
is e/ks where ks is the Boltzmann constant. 

As already mentioned above, the integration algorithm we used to solve numerically the Langevin equations of 
motion is the modified Verlet given in Ref. [25[. The time step was At — 5 x 10 -3 . Each run was 1.7 x 10 7 time 
steps long, including equilibration. Results for a given temperature were obtained averaging over 8 different randomly 
chosen initial conditions. The mean and rms fluctuation of the observables, and in particular of the curvature of the 
landscape Kr, was estimated from the histogram of the sampled values. Statistical errors on the single run have been 
estimated dividing each run in 12 tranches, considered as independent, and then propagated to the final result. 

2. Sequences 

We considered six different sequences of "amino acids" H, P and N: four of 22 monomers, S 22 , S 22 , S 22 , S 22 , and two 
sequences of 46 monomers, S^ 6 , S^ e . The six sequences are listed in Table HI Sequences S 22 and S^ 6 had already been 
identified as good folders [2j| [3(| and our simulations confirmed this finding: below a given temperature S 22 (resp. 
St. 6 ) always reached the same /3-sheet-like structure (resp. /3-barrel-like structure). A plot of the number of native 
contacts N n (see Sec. IIII B~3l for the definition) as a function of the temperature for these two proteinlike sequences is 
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name 


sequence 


Sf 


PH 9 (NP) 2 NHPH 3 PH 


a 22 


PHNPH 3 NHNH 4 (PH2)2PH 


s? 2 


P 4 H 5 NHN 2 H 6 P 3 


SP 


H22 


sf 


P(HP) 5 N3H 9 N3(HP)4N 3 H9 


q46 
Oh 


H46 



TABLE I: The six sequences considered. (X) y means that X is repeated y times. 

reported in Fig. [21 Homopolymers S^ 2 and S^ 6 , on the other hand, showed a hydrophobic collapse but no tendency 
to reach a particular configuration in the collapsed phase, as expected. Sequence S^ 2 (which has the same overall 
composition of S 22 rearranged in a different sequence) behaved as a bad folder and did not reach a unique native 
state, while S 22 was constructed by us to show a somehow intermediate behavior between good and bad folders: it 
always forms the same structure involving the middle of the sequence, while the beginning and the end of the chain 
fluctuate also at low temperature. 

3. Estimate of collapse and folding temperatures 

The hydrophobic collapse temperature Tg and the folding temperature Tf (the latter only for the two protein-like 
sequences S 22 and Sg 6 and for the "intermediate" sequence Sp) were estimated in a standard way as follows. 

Tg has been estimated as the temperature where the specifc heat shows a peak. The rationale for this definition is 
that the hydrophobic collapse of a polymer becomes a thermodynamic phase transition, commonly referred to as 9 
transition, in the thermodynamic limit of an infinite number of monomers 26]; at the critical temperature (usually 
denoted by Tg, notation that we adopt also for our finite size case) the specific heat diverges, and the specific heat peak 
we observed in our systems is nothing but the precursor of this divergence. One could define a collapse temperature 
also by monitoring quantities directly related to the collapse itself, as the gyration radius or the end-to-end distance 
of the polymer; we checked that the estimates of Tg obtained this way were perfectly compatible with those based on 
the specific heat. 

At variance with the collapse transition, the folding transition does not have any corresponding "true" phase 
transition in the thermodynamic limit, simply because no thermodynamic limit exists for a protein: we cannot increase 
the number of amino acids of a protein beyond any limit whithout destroying its protein-like behavior [27L [28[ . From 
the point of view of statistical physics, protein folding may be considered a genuine finite-size phenomenon. Any 
estimate of the folding temperature Tf will then be somehow fuzzy. We adopted one of the various possible protocols. 
First, we defined the fraction N n of native contacts in any configuration as 

N N 

Tin 

c i=l j=i+2 

where ry is the distance between the z-th and j-th residues, r™- the distance between the same residues in the reference 

(native) configuration, n c — Y^=i Y^j=i+2 ®(^s — r ?j) the total number of contacts in the native configuration, and d s 
the threshold distance below which two beads are considered in contact: the value of d s is the mean distance between 
beads in contact in the native configuration. Then, TV is defined as the temperature of the inflection point in the 
curve N n (T). These curves are shown in Fig. The estimated Tf and Tg for the six sequences are given in Table HT1 
Errors on Tg (resp. TV) have been obtained by estimating the interval of the position of the peak in cv(T) (resp. in 
the derivative of N n (T)) compatible with the errors on the numerical data for cy (resp. N n ). We note that, for the 
two sequences where folding occurs, Tf m Tg (indeed Tf — Tg within the estimated errors). This is not surprising at 
all since in general Tf <Tg, and one expects the folding to be more efficient if Tf/Tg sa 1 because in this situation the 
polymer approaches the native state immediately when the temperature is lowered below Tg, reducing the possibility 
of misfolding in non- native compact configurations (l3j . 
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sequence 


T e 


Tf 


Sf 


0.65 ±0.05 


0.55 ±0.1 


n22 


0.55 ±0.15 


none 


s 22 


0.75 ±0.1 


0.7 ±0.2° 


k 


0.65 ±0.05 


none 




0.65 ±0.025 


0.65 ± 0.05 


Q 46 


0.70 ± 0.025 


none 



a For this sequence a "quasi-folding" transition where only half of the sequence folds is detected (see text). 

TABLE II: The collapse (Te) and folding (Tf) temperatures for the six sequences considered. 



4- Thermodynamic and geometric observables 



As to standard thermodynamic observables, all the sequences showed very similar behaviors: to give an example, 
in Fig. [3] we compare the specific heat cy of the homopolymer S 22 and of the good folder S 22 : both exhibit a peak at 
the transition, and on the sole basis of this picture it would be hard to discriminate between a simple hydrophobic 
collapse and a folding. The same happens in the case of the longer homopolymer S^ 6 and the good folder S^ 6 (Fig. 
2J). On the other hand, a dramatic difference between the homopolymers and the good folders shows up if we 
consider the geometric properties of the landscape. In particular, a lot of information appears to be encoded in the 
fluctuations of the Ricci curvature Kr, i.e., of the Laplacian of the potential energy — see Eq. (|24|) . We defined a 
relative adimensional curvature fluctuation a as 



£ ((K%) t - (K R )\ 



7j{K R )t 



(34) 



where (-)t stands for a time average: in Fig. [5] we plot a as a function of the temperature T for the homopolymer S 22 
and for the good folder S 22 . A clear peak shows up in the case of the good folder, close to the folding temperature 
Tf below which the system is mostly in the native state, while no particular mark of the hydrophobic collapse can be 
seen in the case of the homopolymer. A similar situation happens for the longer sequences, the good folder S^ 6 and 
the homopolymer S^ 6 (Fig. [6]). 

The relative curvature fluctuation a of the energy landscape appears then to be a good marker of the presence of 
a folding transition, at least in the simple model considered here. We stress that the comparison between sequences 
of length 22 and 46 clearly indicates that the peak in a is really related to the folding and not to the hydrophobic 
collapse, because a for the long homopolymer S^ 6 is even smoother than in the case of the short homopolymer S 22 , 
at variance with the specific heat which develops a sharper peak, consistently with the fact that the system is closer 
to the situation where a thermodynamic ^-transition exists. 
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FIG. 2: (Color online) (N n ) vs. temperature T for the two good folders: Sg 2 (left), Sg 6 (right). The curves are a guide to the 
eye. 
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FIG. 4: (Color online) As in Fig. |3] for the homopolymer S^ 6 (left) and the good folder 5*g 6 (right). 



In the case of the longer sequence (Fig. [6]) the peak in the curvature fluctuations appears more structured: it seems 
to be the superposition of two peaks, one at Tf rs Tg and another, even higher, at T ~ 0.85 where the specific heat 
shows a "shoulder" . This may be related to the fact that the good folder Sg 6 seems to reach its native state — a 
/3-barrel made of two /3-sheets — in two steps: first, at T ~ 0.85, the two /3-sheets form but still are free to move 
the one respect to the other, then, at Tf, the two sheets fold into the native /3-barrel. This interpretation of the 
folding process as composed of two steps is corroborated by the results on the gyration radius as a function of T (data 
not shown). The curvature fluctuations seem then to indicate both steps of folding with two separate peaks which 
superimpose on each other to give the structure observed in Fig. [5J 

As to the other sequences, for the bad folder S 22 , a(T) is not as smooth as for the homopolymers, but only a very 
weak signal is found at T ~ 0.4, lower than Tg; below this temperature the systems seems to behave as a glass|411. For 
the "intermediate" sequence Sf 2 a peak is present at the "quasi-folding" temperature, although considerably broader 
than in the case of S'g 2 (see Fig. [7J)- 



C. A closer look at the curvature of the landscape 



Where does the peak in cr(T) near Tt come from? To find clues towards an answer we may have a closer look at 
the properties of the curvature of the energy landscape. Looking at the time series of the curvature Kn(t) for the 
two sequences S'g 2 and S 22 , i.e., the good folder and the homopolymer of length 22, respectively, a clear difference 
shows up when we consider data sampled close to the transition temperatures, Tf and Tg, respectively (Fig. [5]). In 
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FIG. 5: (Color online) Relative curvature fluctuation a vs. temperature T for the homopolymer S 22 (left) and for the good 
folder Sg 2 (right). The curves are a guide to the eye. 
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FIG. 6: (Color online) As in Fig. [5] for the homopolymer S^' and the proteinlike sequence Sg 



the homopolymer, Kji{t) oscillates in an apparently random fashion around a constant mean value; the data reported 
in Fig. [8] (left panel) have been taken at T w Tg, but — in the case of the homopolymer — they are not qualitatively 
different from those taken at any other temperature. On the contrary, the curvature signal of the protein- like sequence 
near Tf (right panel in Fig. [5]) is much more structured: there are several time windows where the curvature oscillates 
around a mean value which is considerably lower than the global one, and in these windows also the amplitude of the 
fluctuations is smaller. This effect is even more visible at the temperaure of the peak in the curvature fluctuations 
tr(T) (which is slightly larger than the estimated Tf, although within the estimated errorbar), as shown in Fig. [51 and 
leads to an asymmetric distribution of the values of Kr, resulting in "anomalously large" fluctuations which are at 
the origin of the peak of <r(T) close to Tf. Histograms of the distributions of Kr for the homopolymer at T w Tg 
and for the sequence S% 2 at the temperature of the peak in the fluctuations, T ~ 0.68, are shown in Fig. [TUJ Also the 
distribution of curvatures for the homopolymer is asymmetric, but the asymmetry is considerably lower than in the 
good folder case. 

We interpret these data as direct indications of the presence of two macroregions in the energy landscape, one 
corresponding to the native state and the other — charatcterized by a smaller average curvature — corresponding to 
the unfolded state. Then the dynamics is effectively two-state: close to Tf, the system often jumps between the two 
basins and this explains the behavior of the observed time series of Kr. This interpretation is supported also by the 
following observations. First, the average curvature is a decreasing function of T (Fig. 1 1 1 j) : second, comparing the 
instantaneous values of the curvature and of the number of native contacts (defined in Eq. I33[) we clearly see that the 
time windows where the curvature is lower correspond to zero native contacts, thus indicating that the system is in 
the unfolded state (Fig. Eg) . 
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FIG. 7: (Color online) a(T) for the sequence (on the left) and for the sequence S 22 (on the right). The curves are a guide 
to the eye. 




11000 



10000 - 



9000 



8000 



7000 



6000 



2000 4000 6000 8000 10000 12000 14000 

t 



5000 




2000 4000 6000 8000 10000 12000 14000 
t 



FIG. 8: Kn(t) for the homopolymer S 22 at T w Tg (on the left) and for the protein-like sequence S 22 at T » Tj (on the right). 
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FIG. 9: Knit) for the protein-like sequence S 22 at T ~ 0.68, i.e., at the temperature of the peak in the relative curvature 
fluctuations a(T). 
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FIG. 10: (Color online) Normalized distributions of Kr for the homopolymer S^ 2 at T ~ Te (broad histogram) and for the 
protein-like sequence S 122 at T ~ 0.68 (peaked histogram). The values in the vertical axis must be multiplied by 10 -7 . The 
skewness of the distributions is 0.6 for the homopolymer case and 1.4 for the protein-like case. 
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FIG. 11: (Color online) Average curvature (per degree of freedom) (Kr)/N for the protein-like sequence Sg 2 as a function of 
T. 

This suggests that an effective description of this system in terms of a coarse-grained energy landscape should be 
possible: work is in progress along this direction [32| . 



IV. CONCLUDING REMARKS 



Studying six different sequences of a minimalistic model of a protein, we have shown that a geometric quantity 
which measures the amplitude of the curvature fluctuations of the energy landscape, cr, when plotted as a function 
of the temperature T shows a dramatically different behavior when the system undergoes a folding transition with 
respect to when only a hydrophobic collapse is present; <r(T) can thus be used to mark the folding transition and to 
identify good folders, within the model considered here. 

It must be stressed that no knowledge of the native state is necessary to define a, and that it can be computed 
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with the same computational effort needed to obtain the specific heat and other thermodynamic observables. This 
means that using e.g. reweighting histogram techniques one can reliably estimate the behavior of a as a function of T 
using few simulation runs at properly chosen temperatures, if not a single one. Hence, if tested successfully on other, 
maybe more refined models of proteins, the calculation of the curvature fluctuations might prove a useful tool in the 
search of protein-like sequences. Preliminary results on more refined minimalistic models 31| completely confirm the 
scenario presented here. 

In case numerical evidence accumulates and/or general theoretical arguments become available in favor of our 
suggestion that the peak in the curvature fluctuations is a generic marker of a protein-like behavior, then this method 
could provide, for example, a fast and reasonably cheap numerical tool to make a preliminary discrimination between 
sequences that are likely to be protein-like and the others. This may be useful not only when trying to understand 
the general features of energy landscapes of minimalistic models, but also for tasks of more applicative interest like 
assessing the relevance of point mutations in a sequence. 

We have shown indications that the presence of a peak in the curvature fluctuations when the system undergoes a 
folding is a consequence of the effective two-state dynamics of this system close to the folding transition. On the basis 
of the sole results presented here such an interpretation can not be considered as definitive; nonetheless, it is confirmed 
by p reliminary results on an effective model of the system studied here, which will be discussed in a forthcoming paper 
|32j |. It must also be noticed that higher curvature fluctuations imply in general a higher degree of instability of the 
dynamics (see [Hj]), as expected near Tj where the system has essentially the same probabilty of being in two very 
different states, folded or unfolded. A deeper investigation of the instability properties of the dynamics close to the 
folding transition would probably yield more interesting information. 

Apart from the possible applications of cr(T) as a diagnostic tool, and even more as giving indications about 
the global structure of the energy landscape, the results we have presented here may also be interesting because 
open a connection between the folding transition and symmetry-breaking phase transition. The behavior of cr(T) 
observed here for the good folders is remarkably close to that exhibited by finite systems undergoing a symmetry- 
breaking phase transition in the thermodynamic limit 33], while the case of the homopolymers is similar to that of 
a Berezinskij-Kosterlitz-Thouless transition. This suggests that the folding of a proteinlike heteropolymer does share 
some features of "true" symmetry-breaking phase transitions — at least those features that show up already in finite 
systems — although no singularity in the thermodynamic limit occurs, because proteins are intrinsically finite objects 
[27L [28j . The behavior of <j(T) in systems with thermodynamic phase transitions has been interpreted in terms of 
topological changes of the manifolds where the dynamics of the system "lives" [34], HH (see also [2^, [24[ and (3(| for 
a review). Work is in progress to understand whether such a topological interpretation can be extended also to the 
case of the folding transition, although it has no infinite-system, thermodynamic limit counterpart. 
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